Summary



We consider a linear regression model, with the parameter of interest a specified linear combination of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or by minimizing AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest based on the assumption that the selected model had been given to us a priori. This assumption is false, and it can lead to a confidence interval with poor coverage properties. We provide an easily computed finite sample upper bound (calculated by repeated numerical evaluation of a double integral) on the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum BIC, maximum adjusted $R^2$, minimum Mallows' $C_p$ and $t$-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite sample analogue of an earlier large sample upper bound due to Kabaila and Leeb.

Key words: Adjusted $R^2$-statistic; AIC; "Best subset" regression; BIC; Mallows' criterion; $t$-tests.
1. Introduction



It is very common in applied statistics that the model initially proposed is relatively complicated. The standard statistical methodology for simplifying a complicated model is to carry out a preliminary data-based model selection by, for example, using preliminary hypothesis tests or minimizing AIC. This is usually followed by the inference of interest, using the same data, based on the assumption that the selected model had been given to us a priori. This assumption is false and it can lead to an inaccurate and misleading inference. In one particular context, Breiman (1992) has called this "a quiet scandal in the statistical community". Nonetheless, this type of inference is taught extensively in university courses and is applied widely in practice. It is therefore important to ascertain the extent to which this type of inference is inaccurate and misleading.

Consider the important case in which the inference of interest is either a confidence interval or a confidence region. A confidence interval (region) with nominal coverage $1-\alpha$ that is constructed after preliminary model selection, using the same data and based on the (false) assumption that the selected model had been given to us a priori, will be called a 'naive' $1-\alpha$ confidence interval (region). The literature on the coverage properties of naive confidence intervals and regions is relatively recent. Regal & Hook (1991) provide an example of a log-linear model, parameters and model selection procedure for which the coverage probability of the naive 0.95 confidence interval is far below 0.95. Hurvich & Tsai (1990) provide examples of a linear regression model, parameters and model selection procedures for which the naive 0.9, 0.95 and 0.99 confidence regions for the regression parameter vector have coverages far below 0.9, 0.95 and 0.99, respectively. These authors do not seek to provide a comprehensive analysis of the coverage probability functions of the confidence intervals or regions they consider. Arabatzis et al. (1989), Chiou & Han (1995a, b), Chiou (1997) and Han (1998) find the minimum coverage probabilities of naive confidence intervals in the contexts of some simple models and simple model selection procedures. The minimum coverage probability of the naive confidence interval can be calculated for simple model selection procedures in linear regression involving only a single variable (Kabaila, 1998). The kinds of model selection procedures used in practice in linear regression are typically much more complicated.
For the real-life example considered by Kabaila (2005), there are 20 variables, each of which is to be either included or not, leading to a choice from among $2^{20}$ different models. In more complicated situations such as these, Kabaila (2005), Kabaila & Leeb (2006, Section 3) and Giri & Kabaila (2007) use Monte Carlo simulation methods to assess the minimum coverage probability of the naive confidence interval, in the context of linear regression models. A model selection procedure is said to be 'consistent' if, for any fixed model parameters, the true order of the model is consistently estimated as the sample size $n \to \infty$. Minimization of BIC is such a procedure. Kabaila (1995) and Leeb & Pötscher (2005) are concerned with dispelling the misconception that naive confidence intervals and regions, constructed after a consistent preliminary model selection, will have good coverage properties provided that the sample size is sufficiently large.

Whilst this literature provides examples of the poor coverage performance of naive confidence intervals, it may still be asked whether these examples are merely oddities or whether they are indicative of a more widespread phenomenon. The way to answer this question is by delineating general categories of models and model selection procedures for which the naive confidence interval has poor coverage properties. The aim of the present paper is to make a contribution to such a delineation in the context of the complicated type of model selection procedures used in practice for the linear regression model

$$ Y = X\beta + \varepsilon, $$

where $Y$ is a random $n$-vector of responses, $X$ is a known $n \times p$ matrix with linearly independent columns, $\beta$ is an unknown parameter $p$-vector and $\varepsilon \sim N(0, \sigma^2 I_n)$, where $\sigma^2$ is an unknown positive parameter. Suppose that the quantity of interest is $\theta = a^T\beta$, where $a$ is a known $p$-vector ($a \neq 0$). Our aim is to find a confidence interval for $\theta$ with minimum coverage probability a pre-specified value $1-\alpha$, based on an observation of $Y$.

We suppose that, as a first step, a data-based model selection is used to select a model. Specifically, suppose that the model selection procedure is used to either set $\beta_i$ equal to 0 or allow it to vary freely for each $i = q+1, \dots, p$ ($q \ge 1$). We consider a confidence interval for $\theta$ with nominal coverage $1-\alpha$ constructed on the (false) assumption that the selected model had been given to us a priori. This is the naive $1-\alpha$ confidence interval for $\theta$. Let $\hat\theta, \hat\beta_{q+1}, \dots, \hat\beta_p$ denote the least squares estimators of $\theta, \beta_{q+1}, \dots, \beta_p$, respectively. Let $\mathrm{Corr}(\hat\theta, \hat\beta_j)$ denote the correlation between $\hat\theta$ and $\hat\beta_j$. Assume, without loss of generality, that $|\mathrm{Corr}(\hat\theta, \hat\beta_j)|$ is maximized with respect to $j \in \{q+1, \dots, p\}$ at $j = p$. We use $\rho$ to denote the important parameter $\mathrm{Corr}(\hat\theta, \hat\beta_p)$.
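To make the definition of $\rho$ concrete, here is a minimal Python sketch (standard library only). It assumes a small, full-rank design stored as a list of rows, with columns indexed from 0; the helper names `inverse` and `corr_theta_beta` are hypothetical. It uses $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^TX)^{-1}$, so $\mathrm{Corr}(\hat\theta, \hat\beta_j)$ depends only on $X$ and $a$, and $\sigma^2$ cancels.

```python
import math


def inverse(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def corr_theta_beta(X, a, candidates):
    """Return {j: Corr(theta_hat, beta_hat_j)} for the 0-based column indices in candidates."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[k] for row in X) for k in range(p)] for i in range(p)]
    M = inverse(XtX)                                  # Cov(beta_hat) / sigma^2
    Ma = [sum(M[i][k] * a[k] for k in range(p)) for i in range(p)]
    var_theta = sum(a[i] * Ma[i] for i in range(p))   # Var(theta_hat) / sigma^2
    # Cov(theta_hat, beta_hat_j) / sigma^2 = (M a)_j because M is symmetric
    return {j: Ma[j] / math.sqrt(var_theta * M[j][j]) for j in candidates}
```

With `rho = corr_theta_beta(X, a, range(q, p))` (the 0-based indices of the candidate columns), the column playing the role of $p$ is `max(rho, key=lambda j: abs(rho[j]))`.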

We call a model selection procedure 'conservative' when it is not consistent but, for any fixed model parameters, the probability of choosing only correct models converges to 1 as the sample size $n \to \infty$. Kabaila & Leeb (2006) provide an easily computed large sample upper bound (calculated by repeated numerical evaluation of a single integral) on the minimum coverage probability of this confidence interval for conservative model selection procedures. Minimization of AIC is such a procedure. Consider the case that a conservative model selection procedure is used. The large sample upper bound of Kabaila & Leeb (2006) is a continuous decreasing function of $|\rho|$, which approaches 0 as $|\rho|$ approaches 1 from below. This result tells us that, for large samples, the naive $1-\alpha$ confidence interval has minimum coverage probability far below $1-\alpha$ when $|\rho|$ is close to 1. The importance of this result is that it delineates general categories of design matrices $X$ and model selection procedures for which the naive confidence interval has poor coverage properties in large samples.

In the present paper we provide an easily computed finite sample analogue (calculated by repeated numerical evaluation of a double integral) of the large sample upper bound of Kabaila & Leeb (2006). This finite sample upper bound applies to a wide range of model selection procedures, and is not restricted to conservative ones. For conservative model selection procedures, the large sample upper bound complements the finite sample bound nicely. We suppose that the model selection is based on one of the following methods: (a) minimum AIC, (b) minimum BIC, (c) maximum adjusted $R^2$-statistic, (d) minimum Mallows' $C_p$ and (e) for each $j \in \{q+1, \dots, p\}$, a $t$-test of the null hypothesis $H_{0j}: \beta_j = 0$ against the alternative hypothesis $H_{aj}: \beta_j \neq 0$. We obtain an upper bound on the minimum coverage probability of the naive confidence interval as follows.

For convenience, we introduce the following terminology. If the model selection procedure is (hypothetically) used to either set $\beta_i$ equal to 0 or allow it to vary freely for each $i \in L$, where $L$ is a proper subset of $\{q+1, \dots, p\}$, then we say that "the model selection procedure is applied only to $\beta_i$, $i \in L$". The following result is proved in Section 2. For each given $\ell$ satisfying $q < \ell < p$, the minimum coverage probability of the naive $1-\alpha$ confidence interval is bounded above by the coverage probability of the naive $1-\alpha$ confidence interval for given $(\beta_{\ell+1}, \dots, \beta_p)/\sigma$ and the model selection procedure applied only to $\beta_{\ell+1}, \dots, \beta_p$. Therefore (taking $\ell = p-1$), the minimum coverage probability of the naive confidence interval is bounded above by the coverage probability of the naive $1-\alpha$ confidence interval for given $\beta_p/\sigma$ and the model selection procedure applied only to $\beta_p$. In Section 3 we derive an easily computed expression for this upper bound for given $\beta_p/\sigma$. This expression is easily minimized numerically with respect to $\beta_p/\sigma$ to obtain the value of the finite sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval.

This upper bound is a continuous decreasing function of $|\rho|$. Some illustrative numerical evaluations of this upper bound are presented in Section 4. See, for example, Figure 1, which is a plot of this upper bound as a function of $|\rho|$ for model selection by minimizing Mallows' $C_p$, with $m = n - p = 5, 20, 50, 1000$ and $\infty$ (i.e. the large sample upper bound of Kabaila & Leeb (2006)). The new finite sample upper bound tells us that the naive $1-\alpha$ confidence interval has minimum coverage probability far below $1-\alpha$ when $|\rho|$ is close to 1. The importance of this result is that it delineates a general category of design matrices $X$ and model selection procedures for which the naive confidence interval has poor coverage properties in finite samples.



2. Two important preliminary results 

Suppose that the model selection procedure is used to either set $\beta_i$ equal to 0 or allow it to vary freely for each $i = q+1, \dots, p$ ($q \ge 1$). Let $\mathcal{K}$ denote the family of all subsets of $\{q+1, \dots, p\}$, including the empty set $\emptyset$. We use $\hat K$ to denote the element of $\mathcal{K}$ chosen by the model selection procedure. Let $\hat\beta$ denote the least squares estimator of $\beta$. Let RSS denote the residual sum of squares

$$ \mathrm{RSS} = (Y - X\hat\beta)^T (Y - X\hat\beta). $$

Let $K$ be a fixed subset of $\{q+1, \dots, p\}$ and suppose that $\beta_i$ is set equal to zero for each $i \in K$ and is freely varying for each $i \notin K$. Let $|K|$ denote the number of elements in $K$. Also let $H_K$ denote the $|K| \times p$ matrix whose $i$th row consists of zeros except for the $j$th element, which is 1, where $j$ is the $i$th ordered element of $K$. Thus $H_K\beta = 0$. Let $\hat\beta_K$ denote the least squares estimator of $\beta$ subject to this restriction. Also let $\mathrm{RSS}_K$ denote the residual sum of squares

$$ \mathrm{RSS}_K = (Y - X\hat\beta_K)^T (Y - X\hat\beta_K), $$

and $S_K^2 = \mathrm{RSS}_K/(n - p + |K|)$. The standard $1-\alpha$ confidence interval for $\theta$, assuming that $H_K\beta = 0$, is

$$ I(K) = \left[a^T\hat\beta_K - d_K,\; a^T\hat\beta_K + d_K\right], $$

where $d_K = t(n-p+|K|)\, S_K \sqrt{v(K)}$, $t(m)$ is defined by $P(-t(m) \le T \le t(m)) = 1-\alpha$ for $T \sim t_m$, and $v(K) = \mathrm{Var}(a^T\hat\beta_K)/\sigma^2$.
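The interval $I(K)$ can be sketched directly from these definitions. The Python below (standard library only; `interval_given_K` and `inverse` are hypothetical helpers) fits the restricted model by dropping the columns in $K$, which is equivalent to least squares subject to $H_K\beta = 0$, and computes $v(K)$ as the quadratic form in the retained entries of $a$ and the inverse of the retained columns' cross-product matrix. The quantile $t(n-p+|K|)$ is passed in by the caller (e.g. from tables) rather than computed, and columns are indexed from 0.

```python
import math


def inverse(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def interval_given_K(X, Y, a, K, t_quantile):
    """I(K) for theta = a^T beta when beta_j = 0 for every 0-based index j in K.

    t_quantile must be t(n - p + |K|); it is supplied by the caller.
    """
    n, p = len(X), len(X[0])
    keep = [j for j in range(p) if j not in K]
    Xk = [[row[j] for j in keep] for row in X]          # columns left free
    ak = [a[j] for j in keep]
    r = len(keep)
    Ginv = inverse([[sum(x[i] * x[k] for x in Xk) for k in range(r)] for i in range(r)])
    XtY = [sum(x[i] * y for x, y in zip(Xk, Y)) for i in range(r)]
    b = [sum(Ginv[i][k] * XtY[k] for k in range(r)) for i in range(r)]   # restricted LS fit
    rss_K = sum((y - sum(bi * xi for bi, xi in zip(b, x))) ** 2 for x, y in zip(Xk, Y))
    S_K = math.sqrt(rss_K / (n - p + len(K)))
    v_K = sum(ak[i] * Ginv[i][k] * ak[k] for i in range(r) for k in range(r))  # v(K)
    centre = sum(ai * bi for ai, bi in zip(ak, b))       # a^T beta_hat_K
    half_width = t_quantile * S_K * math.sqrt(v_K)       # d_K
    return centre - half_width, centre + half_width
```

Dropping columns keeps the sketch short; for large designs a QR-based least squares routine would be the more robust choice.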
We consider the following 4 methods of model selection.

Method 1 (minimizing an AIC-like criterion)
$\hat K$ minimizes

$$ \mathrm{AIC}(K) = n \ln(\mathrm{RSS}_K) + 2(p - |K|) f(n) $$

with respect to $K \in \mathcal{K}$. Here, $f(n)$ is 1 for AIC and $\tfrac{1}{2}\ln(n)$ for BIC.

Method 2 (minimizing Mallows' $C_p$)
$\hat K$ minimizes

$$ C_K = \frac{\mathrm{RSS}_K}{\mathrm{RSS}/(n-p)} - n + 2(p - |K|) $$

with respect to $K \in \mathcal{K}$.

Method 3 (maximizing adjusted $R^2$)
$\hat K$ minimizes

$$ B_K = \frac{\mathrm{RSS}_K}{n - p + |K|} $$

with respect to $K \in \mathcal{K}$.

Method 4 ($t$-tests)

$\hat K$ consists of the set of $j \in \{q+1, \dots, p\}$ for which a $t$-test of the null hypothesis $H_{0j}: \beta_j = 0$ against the alternative hypothesis $H_{aj}: \beta_j \neq 0$ leads to acceptance of $H_{0j}$.

The naive $1-\alpha$ confidence interval for $\theta$ is the interval $I(\hat K)$.
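Methods 1-3 can be sketched as an exhaustive search over $\mathcal{K}$, given the residual sums of squares $\mathrm{RSS}_K$. In the hypothetical Python helper below, the dictionary `rss_by_K` maps each `frozenset` $K$ to $\mathrm{RSS}_K$ (the empty set gives the full-model RSS); Method 4 is not covered. Since $\mathcal{K}$ has $2^{p-q}$ elements, this brute-force search is only practical for a modest number of candidate variables.

```python
import math
from itertools import combinations


def all_subsets(indices):
    """The family of all subsets of `indices`, including the empty set."""
    idx = sorted(indices)
    return [frozenset(c) for r in range(len(idx) + 1) for c in combinations(idx, r)]


def select_model(rss_by_K, n, p, method):
    """K_hat for Methods 1-3, given rss_by_K = {frozenset K: RSS_K}.

    The empty set must be present; its value is RSS for the full model.
    method is one of "AIC", "BIC", "Cp", "adjR2".
    """
    rss_full = rss_by_K[frozenset()]

    def criterion(K):
        rss_K, size = rss_by_K[K], len(K)
        if method in ("AIC", "BIC"):
            f = 1.0 if method == "AIC" else 0.5 * math.log(n)
            return n * math.log(rss_K) + 2 * (p - size) * f            # AIC(K)
        if method == "Cp":
            return rss_K / (rss_full / (n - p)) - n + 2 * (p - size)    # C_K
        if method == "adjR2":
            return rss_K / (n - p + size)                               # B_K
        raise ValueError(f"unknown method: {method}")

    return min(rss_by_K, key=criterion)
```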
Suppose that the integer $\ell$ satisfies $q+1 \le \ell < p$. Let $\mathcal{K}^*$ denote the family of all subsets of $\{\ell+1, \dots, p\}$, including the empty set $\emptyset$. The following theorem paves the way for Theorem 2, which is the main result of this section.

Theorem 1. Consider the following 4 cases.

Case 1 $K^*$ minimizes $\mathrm{AIC}(K)$ with respect to $K \in \mathcal{K}^*$.

Case 2 $K^*$ minimizes $C_K$ with respect to $K \in \mathcal{K}^*$.

Case 3 $K^*$ minimizes $B_K$ with respect to $K \in \mathcal{K}^*$.

Case 4 $K^*$ consists of the set of $j \in \{\ell+1, \dots, p\}$ for which a $t$-test of the null hypothesis $H_{0j}: \beta_j = 0$ against the alternative hypothesis $H_{aj}: \beta_j \neq 0$ leads to acceptance of $H_{0j}$.

For each of these cases, the coverage probability of the confidence interval $I(K^*)$ is a function of $(\beta_{\ell+1}, \dots, \beta_p)/\sigma$.

This theorem is proved in Appendix A.

It is intuitively plausible that the wider the class of models from which one selects using a given model selection procedure, the smaller the minimum coverage probability of the naive $1-\alpha$ confidence interval. The following theorem formalizes this plausible result. We will use this theorem in Section 3 to derive an easily computed finite sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval.

Theorem 2. Consider the following 4 cases.

Case 1 $\hat K$ minimizes $\mathrm{AIC}(K)$ with respect to $K \in \mathcal{K}$.

$K^*$ minimizes $\mathrm{AIC}(K)$ with respect to $K \in \mathcal{K}^*$.

Case 2 $\hat K$ minimizes $C_K$ with respect to $K \in \mathcal{K}$.

$K^*$ minimizes $C_K$ with respect to $K \in \mathcal{K}^*$.

Case 3 $\hat K$ minimizes $B_K$ with respect to $K \in \mathcal{K}$.

$K^*$ minimizes $B_K$ with respect to $K \in \mathcal{K}^*$.

Case 4 $\hat K$ consists of the set of $j \in \{q+1, \dots, p\}$ for which a $t$-test of the null hypothesis $H_{0j}: \beta_j = 0$ against the alternative hypothesis $H_{aj}: \beta_j \neq 0$ leads to acceptance of $H_{0j}$. $K^*$ consists of the set of $j \in \{\ell+1, \dots, p\}$ for which a $t$-test of the null hypothesis $H_{0j}: \beta_j = 0$ against the alternative hypothesis $H_{aj}: \beta_j \neq 0$ leads to acceptance of $H_{0j}$.

For each of these cases, the minimum coverage probability of the naive $1-\alpha$ confidence interval $I(\hat K)$ is bounded above by the coverage probability of $I(K^*)$ for each given $(\beta_{\ell+1}, \dots, \beta_p)/\sigma \in \mathbb{R}^{p-\ell}$.

This theorem is proved in Appendix B.

3. An easily computed finite sample upper bound on the minimum coverage probability of the naive confidence interval

In this section we present an easily computed finite sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval. Theorem 2 implies that (for each of the methods considered) this minimum coverage probability is bounded above by the coverage probability of the naive $1-\alpha$ confidence interval for given $\beta_p/\sigma$ and the model selection procedure applied only to $\beta_p$. Theorem 3 provides an easily computed expression for the latter coverage probability. This expression is easily minimized numerically with respect to $\beta_p/\sigma$ to obtain the value of the finite sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval.

Define the matrix $V$ to be the covariance matrix of $(\hat\theta, \hat\beta_p)$ divided by $\sigma^2$. Let $v_{ij}$ denote the $(i,j)$th element of $V$. Also define the random variable

$$ W = \frac{1}{\sigma}\sqrt{\frac{\mathrm{RSS}}{n-p}} $$

and the parameter

$$ \gamma = \frac{\beta_p}{\sigma\sqrt{v_{22}}}. $$

The random variable $W$ has the same distribution as $\sqrt{Q/(n-p)}$, where $Q \sim \chi^2_{n-p}$.
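As a check on the stated distribution of $W$, the following Python sketch (standard library only; the function names are ours) writes out the density of $W = \sqrt{Q/m}$, with $m = n - p$ and $Q \sim \chi^2_m$. A change of variables from the $\chi^2_m$ density gives $f_W(w) = 2 (m/2)^{m/2} w^{m-1} e^{-m w^2/2} / \Gamma(m/2)$ for $w > 0$; the code confirms numerically that it integrates to 1 and that $E(W^2) = 1$.

```python
import math


def f_W(w, m):
    """Density of W = sqrt(Q/m), Q ~ chi-squared with m degrees of freedom, for w > 0."""
    log_f = (math.log(2.0) + 0.5 * m * math.log(0.5 * m) + (m - 1) * math.log(w)
             - 0.5 * m * w * w - math.lgamma(0.5 * m))
    return math.exp(log_f)


def midpoint(fn, lo, hi, N=20000):
    """Composite midpoint rule; never evaluates fn at the end points (so w = 0 is avoided)."""
    h = (hi - lo) / N
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(N))


def check_W(m):
    """Return (integral of f_W, E(W^2)) over a range around 1 holding essentially all the mass."""
    lo = max(0.0, 1.0 - 12.0 / math.sqrt(2.0 * m))
    hi = 1.0 + 12.0 / math.sqrt(2.0 * m)
    return (midpoint(lambda w: f_W(w, m), lo, hi),
            midpoint(lambda w: w * w * f_W(w, m), lo, hi))
```

Working on the log scale keeps the density finite even for large $m$, where $(m/2)^{m/2}$ and $\Gamma(m/2)$ individually overflow.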



We have defined $\rho = \mathrm{Corr}(\hat\theta, \hat\beta_p)$, so that $\rho = v_{12}/\sqrt{v_{11} v_{22}}$. Define the functions

$$ \ell_1(w) = -t(n-p)\,w, \qquad u_1(w) = t(n-p)\,w, $$

$$ \ell_2(h, w, \rho) = \rho h - t(n-p+1) \sqrt{\frac{(n-p)w^2 + h^2}{n-p+1}}\, \sqrt{1-\rho^2}, $$

$$ u_2(h, w, \rho) = \rho h + t(n-p+1) \sqrt{\frac{(n-p)w^2 + h^2}{n-p+1}}\, \sqrt{1-\rho^2}. $$

Now define the functions

$$ k^\dagger(h, w, \gamma, \rho) = \Psi\left(\ell_1(w), u_1(w); \rho(h-\gamma), 1-\rho^2\right), $$

$$ k(h, w, \gamma, \rho) = \Psi\left(\ell_2(h, w, \rho), u_2(h, w, \rho); \rho(h-\gamma), 1-\rho^2\right), $$

where $\Psi(x, y; \mu, \nu) = P(x \le Z \le y)$ for $Z \sim N(\mu, \nu)$. Also define $T = \hat\beta_p \big/ \left(\sqrt{\mathrm{RSS}/(n-p)}\,\sqrt{v_{22}}\right)$. We use these definitions in the statement of the following theorem.

Theorem 3. Suppose that $\mathcal{K}^* = \{\{p\}, \emptyset\}$. Consider the following 4 cases.

Case 1 $K^*$ minimizes $\mathrm{AIC}(K)$ with respect to $K \in \mathcal{K}^*$. Define

$$ d = \sqrt{\left(\exp\left(\frac{2 f(n)}{n}\right) - 1\right)(n-p)}. $$

Case 2 $K^*$ minimizes $C_K$ with respect to $K \in \mathcal{K}^*$. Define $d = \sqrt{2}$.

Case 3 $K^*$ minimizes $B_K$ with respect to $K \in \mathcal{K}^*$. Define $d = 1$.

Case 4 If $|T| > d$ then $K^* = \emptyset$; otherwise $K^* = \{p\}$. Here $d$ denotes the critical value of the $t$-test of $H_{0p}: \beta_p = 0$.

In each of these 4 cases, the coverage probability of the confidence interval $I(K^*)$ is an even function of $\gamma$ and is equal to

$$ (1-\alpha) + \int_0^\infty \int_{-d}^{d} \left(k(wx, w, \gamma, \rho) - k^\dagger(wx, w, \gamma, \rho)\right) \phi(wx - \gamma)\, w\, f_W(w)\, dx\, dw, \qquad (1) $$

where $\phi$ denotes the $N(0,1)$ probability density function and $f_W$ denotes the probability density function of $W$. For given $\gamma$, (1) is an even function of $\rho$.

This theorem is proved in Appendix C. It has the following corollary. 

Corollary 1. Consider the 4 cases described in Theorem 3. In each of these 4 cases, the minimum coverage probability of the naive $1-\alpha$ confidence interval is bounded above by the minimum over $\gamma \ge 0$ of (1).
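The thresholds $d$ in Theorem 3 come from comparing each criterion at $K = \emptyset$ and $K = \{p\}$, using $\mathrm{RSS}_{\{p\}} = \mathrm{RSS} + \hat\beta_p^2/v_{22} = \mathrm{RSS}\,(1 + T^2/(n-p))$. The hypothetical Python helpers below compute $d$ for Cases 1-3 and compare the criteria directly, so that the equivalence "leave $\beta_p$ free if and only if $|T| > d$" can be checked numerically. This is an illustration under those stated assumptions, not part of the original argument.

```python
import math


def threshold_d(method, n, p):
    """d in Theorem 3, Cases 1-3 (method is "AIC", "BIC", "Cp" or "adjR2")."""
    if method in ("AIC", "BIC"):
        f = 1.0 if method == "AIC" else 0.5 * math.log(n)
        return math.sqrt(math.expm1(2.0 * f / n) * (n - p))   # expm1 avoids cancellation
    if method == "Cp":
        return math.sqrt(2.0)
    if method == "adjR2":
        return 1.0
    raise ValueError(f"unknown method: {method}")


def keeps_beta_p(method, T, n, p, rss=1.0):
    """True when the criterion prefers K = {} (beta_p free) to K = {p}.

    Uses RSS_{p} = RSS * (1 + T^2 / (n - p)).
    """
    rss_p = rss * (1.0 + T * T / (n - p))
    if method in ("AIC", "BIC"):
        f = 1.0 if method == "AIC" else 0.5 * math.log(n)
        return n * math.log(rss) + 2 * p * f < n * math.log(rss_p) + 2 * (p - 1) * f
    if method == "Cp":
        s2 = rss / (n - p)
        return rss / s2 - n + 2 * p < rss_p / s2 - n + 2 * (p - 1)
    if method == "adjR2":
        return rss / (n - p) < rss_p / (n - p + 1)
    raise ValueError(f"unknown method: {method}")
```

For AIC, `threshold_d("AIC", n, p)` approaches $\sqrt{2}$ as $n$ grows with $p$ fixed, matching the value for Mallows' $C_p$.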
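That this corollary is a finite sample analogue of Theorem 1 of Kabaila & Leeb (2006) is confirmed as follows. The following are conservative model selection procedures: minimizing AIC, minimizing Mallows' $C_p$ and maximizing adjusted $R^2$. Define $d' = \sqrt{2}$ for model selection by minimizing AIC and by minimizing Mallows' $C_p$. Also define $d' = 1$ for model selection by maximizing adjusted $R^2$. Define $z$ by $P(-z \le Z \le z) = 1-\alpha$ for $Z \sim N(0,1)$. Also define $\Delta(a, b) = \Phi(a+b) - \Phi(a-b)$ for all $a, b \in \mathbb{R}$, where $\Phi$ denotes the $N(0,1)$ distribution function. Consider $p$ and $\rho$ fixed and $n \to \infty$. Now $t(n-p) \to z$ as $n \to \infty$. For model selection using AIC, $d \to d'$ as $n \to \infty$ (for the other two procedures, $d = d'$). It may be shown that, for each of these conservative model selection procedures, (1) converges to

$$ 1 - \alpha + \Delta\left(\frac{\rho\gamma}{\sqrt{1-\rho^2}}, z\right)\Delta(\gamma, d') - \int_{-d'}^{d'} \Delta\left(\frac{\rho(h-\gamma)}{\sqrt{1-\rho^2}}, \frac{z}{\sqrt{1-\rho^2}}\right)\phi(h-\gamma)\,dh \qquad (2) $$

uniformly in $\gamma$ as $n \to \infty$. Now

$$ \int_{-d'}^{d'} \Delta\left(\frac{\rho(h-\gamma)}{\sqrt{1-\rho^2}}, \frac{z}{\sqrt{1-\rho^2}}\right)\phi(h-\gamma)\,dh = P(-z \le A \le z,\; -d' \le B \le d'), $$

where

$$ \begin{pmatrix} A \\ B \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ \gamma \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), $$

since $B \sim N(\gamma, 1)$ and the distribution of $A$ conditional on $B = h$ is $N(\rho(h-\gamma), 1-\rho^2)$. Conditioning on $A$ instead, $A \sim N(0,1)$ and the distribution of $B$ conditional on $A = h$ is $N(\gamma + \rho h, 1-\rho^2)$, and so

$$ P(-z \le A \le z,\; -d' \le B \le d') = \int_{-z}^{z} \Delta\left(\frac{\gamma + \rho h}{\sqrt{1-\rho^2}}, \frac{d'}{\sqrt{1-\rho^2}}\right)\phi(h)\,dh. $$

Thus (2) is equal to (4) of Kabaila & Leeb (2006). This shows that the finite sample upper bound stated in Corollary 1 converges to the large sample upper bound (3) of Kabaila & Leeb (2006) as $n \to \infty$.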
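The limit (2) and the conditioning identity that follows it are easy to check numerically. The sketch below uses the Python standard library, with `statistics.NormalDist` supplying $\Phi$, $\phi$ and the quantile $z$; the helper names and the Simpson's-rule quadrature are our own choices. It evaluates (2) with the integral written either by conditioning on $B$ or by conditioning on $A$, and minimizes over a grid of $\gamma \ge 0$ to approximate the large sample bound. It assumes $|\rho| < 1$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is phi


def delta(a, b):
    """Delta(a, b) = Phi(a + b) - Phi(a - b)."""
    return STD.cdf(a + b) - STD.cdf(a - b)


def simpson(fn, lo, hi, N=2000):
    """Composite Simpson's rule with an even number N of panels."""
    h = (hi - lo) / N
    inner = sum((4 if i % 2 else 2) * fn(lo + i * h) for i in range(1, N))
    return (fn(lo) + fn(hi) + inner) * h / 3


def limit_coverage(gamma, rho, d_prime, alpha, condition_on_A=False):
    """Large sample limit (2) of (1); requires |rho| < 1."""
    z = STD.inv_cdf(1 - alpha / 2)
    s = math.sqrt(1 - rho * rho)
    if condition_on_A:
        # P(-z <= A <= z, -d' <= B <= d') with B | A = h ~ N(gamma + rho h, 1 - rho^2)
        joint = simpson(lambda h: delta((gamma + rho * h) / s, d_prime / s) * STD.pdf(h),
                        -z, z)
    else:
        # the same probability, with A | B = h ~ N(rho (h - gamma), 1 - rho^2)
        joint = simpson(lambda h: delta(rho * (h - gamma) / s, z / s) * STD.pdf(h - gamma),
                        -d_prime, d_prime)
    return 1 - alpha + delta(rho * gamma / s, z) * delta(gamma, d_prime) - joint


def large_sample_bound(rho, d_prime, alpha, gammas):
    """Minimum of (2) over a grid of gamma >= 0 (a grid search, not an optimizer)."""
    return min(limit_coverage(g, rho, d_prime, alpha) for g in gammas)
```

At $\rho = 0$ the two $\Delta$ terms cancel and (2) reduces to $1-\alpha$ for every $\gamma$, which makes a convenient sanity check.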
The following result provides an explicit formula for the upper bound described in Corollary 1 for the particular case that $\rho = 1$. The proof of this result is omitted for the sake of brevity.

Theorem 4. Suppose that $\rho = 1$. Let $d$ be as defined in the statement of Theorem 3. The upper bound, described in Corollary 1, on the minimum coverage probability of the naive $1-\alpha$ confidence interval is

$$ 2\int_0^\infty \left(\Phi(t(n-p)\,w) - \Phi(d\,w)\right) f_W(w)\,dw $$

when $d < t(n-p)$, and is 0 when $d \ge t(n-p)$.
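Theorem 4 gives the bound at $\rho = 1$ as a single integral. The Python sketch below (standard library only; helper names hypothetical) evaluates it with a midpoint rule over a range of $w$ holding essentially all the mass of $W$. Because the standard library has no Student $t$ quantile, $t(m)$ is obtained here by bisection on a Simpson's-rule approximation of the $t_m$ distribution function; that is an implementation choice of this sketch, not the authors' method.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def simpson(fn, lo, hi, N=2000):
    """Composite Simpson's rule with an even number N of panels."""
    h = (hi - lo) / N
    inner = sum((4 if i % 2 else 2) * fn(lo + i * h) for i in range(1, N))
    return (fn(lo) + fn(hi) + inner) * h / 3


def t_quantile(m, alpha):
    """t(m) with P(-t(m) <= T <= t(m)) = 1 - alpha for T ~ t_m, by bisection."""
    c = math.lgamma((m + 1) / 2) - math.lgamma(m / 2) - 0.5 * math.log(m * math.pi)
    density = lambda u: math.exp(c - 0.5 * (m + 1) * math.log1p(u * u / m))
    central = lambda x: 2.0 * simpson(density, 0.0, x)   # P(-x <= T <= x)
    lo, hi = 0.0, 1.0
    while central(hi) < 1 - alpha:                         # bracket the root
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_W(w, m):
    """Density of W = sqrt(Q/m), Q ~ chi-squared_m, for w > 0."""
    return math.exp(math.log(2.0) + 0.5 * m * math.log(0.5 * m) + (m - 1) * math.log(w)
                    - 0.5 * m * w * w - math.lgamma(0.5 * m))


def rho_one_bound(d, m, alpha, N=4000):
    """Theorem 4: the Corollary 1 bound when rho = 1, with m = n - p."""
    t = t_quantile(m, alpha)
    if d >= t:
        return 0.0
    lo = max(0.0, 1.0 - 12.0 / math.sqrt(2.0 * m))   # [lo, hi] holds essentially all mass of W
    hi = 1.0 + 12.0 / math.sqrt(2.0 * m)
    h = (hi - lo) / N
    total = 0.0
    for i in range(N):                                # midpoint rule, so w = 0 is never used
        w = lo + (i + 0.5) * h
        total += (PHI(t * w) - PHI(d * w)) * f_W(w, m)
    return 2.0 * total * h
```

For large $m$, $W$ concentrates at 1 and the result approaches $2(\Phi(z) - \Phi(d))$, which is a quick plausibility check.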

4. Numerical illustrations 

The integrand of the double integral in (1) is a smooth function of $(x, w)$, and so (1) is easily computed numerically. Let $m = n - p$ and remember that $\rho = \mathrm{Corr}(\hat\theta, \hat\beta_p)$, where $p$ maximizes $|\mathrm{Corr}(\hat\theta, \hat\beta_j)|$ with respect to $j \in \{q+1, \dots, p\}$. For given $p$, $m$, $\alpha$ and $\rho$, we minimize (1) numerically with respect to $\gamma \ge 0$ to obtain the upper bound (described in Corollary 1) on the minimum coverage probability of the naive $1-\alpha$ confidence interval $I(\hat K)$. The following are conservative model selection procedures: minimizing Mallows' $C_p$, maximizing adjusted $R^2$ and minimizing AIC. For the numerical illustrations for these procedures described in this section, we include the case $m = \infty$. For this case, we use the large sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval derived by Kabaila & Leeb (2006). Programs for computing these upper bounds have been written in MATLAB (including the use of the Optimization and Statistics toolboxes).
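The computation described above can be sketched in Python with the standard library alone. This is not the authors' MATLAB program: the helper names, the midpoint rules over $x \in [-d, d]$ and over a truncated range of $w$, the grid sizes, and the grid search over $\gamma$ (instead of a numerical optimizer) are all assumptions of this sketch. It evaluates (1) for given $\gamma$, $\rho$ (with $|\rho| < 1$), $d$, $m = n-p$ and $\alpha$, and takes the minimum over a supplied grid of $\gamma \ge 0$, as in Corollary 1.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def simpson(fn, lo, hi, N=2000):
    h = (hi - lo) / N
    inner = sum((4 if i % 2 else 2) * fn(lo + i * h) for i in range(1, N))
    return (fn(lo) + fn(hi) + inner) * h / 3


def t_quantile(m, alpha):
    """t(m) with P(-t(m) <= T <= t(m)) = 1 - alpha, T ~ t_m (bisection on a Simpson CDF)."""
    c = math.lgamma((m + 1) / 2) - math.lgamma(m / 2) - 0.5 * math.log(m * math.pi)
    density = lambda u: math.exp(c - 0.5 * (m + 1) * math.log1p(u * u / m))
    central = lambda x: 2.0 * simpson(density, 0.0, x)
    lo, hi = 0.0, 1.0
    while central(hi) < 1 - alpha:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if central(mid) < 1 - alpha else (lo, mid)
    return 0.5 * (lo + hi)


def f_W(w, m):
    """Density of W = sqrt(Q/m), Q ~ chi-squared_m, for w > 0."""
    return math.exp(math.log(2.0) + 0.5 * m * math.log(0.5 * m) + (m - 1) * math.log(w)
                    - 0.5 * m * w * w - math.lgamma(0.5 * m))


def psi(x, y, mu, var):
    """Psi(x, y; mu, var) = P(x <= Z <= y) for Z ~ N(mu, var)."""
    s = math.sqrt(var)
    return STD.cdf((y - mu) / s) - STD.cdf((x - mu) / s)


def coverage(gamma, rho, d, m, alpha, t1, t2, Nw=120, Nx=60):
    """Midpoint-rule evaluation of (1); t1 = t(m) and t2 = t(m + 1)."""
    v = 1.0 - rho * rho
    w_lo = max(0.0, 1.0 - 12.0 / math.sqrt(2.0 * m))   # truncated range for W
    w_hi = 1.0 + 12.0 / math.sqrt(2.0 * m)
    hw, hx = (w_hi - w_lo) / Nw, 2.0 * d / Nx
    total = 0.0
    for i in range(Nw):
        w = w_lo + (i + 0.5) * hw
        fw = f_W(w, m)
        for j in range(Nx):
            h = w * (-d + (j + 0.5) * hx)                 # h = w x
            mu = rho * (h - gamma)                        # mean of G given H = h
            half = t2 * math.sqrt((m * w * w + h * h) / (m + 1)) * math.sqrt(v)
            k = psi(rho * h - half, rho * h + half, mu, v)       # k(wx, w, gamma, rho)
            k_dag = psi(-t1 * w, t1 * w, mu, v)                  # k-dagger(wx, w, gamma, rho)
            total += (k - k_dag) * STD.pdf(h - gamma) * w * fw
    return (1.0 - alpha) + total * hw * hx


def upper_bound(rho, d, m, alpha, gammas):
    """Corollary 1: the minimum of (1) over a grid of gamma >= 0."""
    t1, t2 = t_quantile(m, alpha), t_quantile(m + 1, alpha)
    return min(coverage(g, rho, d, m, alpha, t1, t2) for g in gammas)
```

For example, `upper_bound(0.9, math.sqrt(2), 20, 0.05, [0.1 * i for i in range(51)])` approximates the bound for minimizing Mallows' $C_p$ with $m = 20$ and $|\rho| = 0.9$; finer grids trade speed for accuracy. The symmetric $x$ grid preserves the evenness in $\gamma$ and $\rho$ stated in Theorem 3 up to rounding.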

For model selection by minimizing Mallows' $C_p$ or maximizing adjusted $R^2$, $d$ is a fixed number that depends on neither $p$ nor $m$. In this case, the upper bound (described in Corollary 1) on the minimum coverage probability of the naive $1-\alpha$ confidence interval is, for given $\alpha$ and $|\rho|$, a function of $m$ only. Plots of this upper bound as a function of $|\rho|$, for model selection by minimizing Mallows' $C_p$ and by maximizing adjusted $R^2$, were prepared for $\alpha \in \{0.1, 0.05, 0.02\}$ and $m = 1, 2, 3, 4, 5, 10, 20, 50, 1000$ and $\infty$. For each value of $\alpha$ and $m$ considered, this upper bound was found to be a continuous decreasing function of $|\rho|$ that is far below $1-\alpha$ when $|\rho|$ is close to 1. This finding is illustrated by Figures 1 and 2. Figure 1 is a plot of this upper bound as a function of $|\rho|$ for model selection by minimizing Mallows' $C_p$ and for $m = 5, 20, 50, 1000$ and $\infty$. Figure 2 is a plot of this upper bound as a function of $|\rho|$ for model selection by maximizing adjusted $R^2$ and for $m = 5, 20, 50, 1000$ and $\infty$.

Now consider model selection using AIC. When $n$ is large and $p$ is small compared to $n$, $d$ is approximately equal to $\sqrt{2}$, and the upper bound described by Corollary 1 is approximately equal to the corresponding upper bound for model selection by minimizing Mallows' $C_p$. Plots of this upper bound as a function of $|\rho|$, for model selection by minimizing AIC, were prepared for $\alpha = 0.05$, $p \in \{2, 3, 4, 7, 10\}$ and $m = 1, 2, 3, 4, 5, 10, 20, 50, 1000$ and $\infty$. For each value of $p$ and $m$ considered, this upper bound was found to be a continuous decreasing function of $|\rho|$ that is far below $1-\alpha$ when $|\rho|$ is close to 1. This finding is illustrated by Figure 3, which is a plot of the upper bound described by Corollary 1 as a function of $|\rho|$ for model selection by minimizing AIC, for $\alpha = 0.05$, $p = 10$ and $m = 5, 20, 50, 1000$ and $\infty$. For the real-life data example considered by Kabaila & Leeb (2006, Section 3), $p = 10$ and $m = 20$.

Finally, consider model selection using BIC. Since this model selection procedure is consistent, the large sample upper bound on the minimum coverage probability of the naive $1-\alpha$ confidence interval, derived by Kabaila & Leeb (2006), does not apply. Plots of the upper bound described by Corollary 1 as a function of $|\rho|$, for model selection by minimizing BIC, were prepared for $\alpha = 0.05$, $p \in \{2, 3, 4, 7, 10\}$ and $m = 1, 2, 3, 4, 5, 10, 20, 50, 1000$ and 10,000. For each value of $p$ and $m$ considered, this upper bound was found to be a continuous decreasing function of $|\rho|$ that is far below $1-\alpha$ when $|\rho|$ is close to 1. This finding is illustrated by Figure 4, which is a plot of the upper bound described by Corollary 1 as a function of $|\rho|$ for model selection by minimizing BIC, for $\alpha = 0.05$, $p = 10$ and $m = 5, 20, 50, 1000$ and 10,000.
Figure 1: Plot of the upper bound, stated in Corollary 1, on the minimum coverage probability of the naive 95% confidence interval against $|\rho|$ when model selection is by minimization of Mallows' $C_p$. Here $m = n - p = 5, 20, 50, 1000$ and $\infty$.

Figure 2: Plot of the upper bound, stated in Corollary 1, on the minimum coverage probability of the naive 95% confidence interval against $|\rho|$ when model selection is by maximization of adjusted $R^2$. Here $m = n - p = 5, 20, 50, 1000$ and $\infty$.

Figure 3: Plot of the upper bound, stated in Corollary 1, on the minimum coverage probability of the naive 95% confidence interval against $|\rho|$ when model selection is by minimization of AIC. Here $p = 10$ and $m = n - p = 5, 20, 50, 1000$ and $\infty$.

Figure 4: Plot of the upper bound, stated in Corollary 1, on the minimum coverage probability of the naive 95% confidence interval against $|\rho|$ when model selection is by minimization of BIC. Here $p = 10$ and $m = n - p = 5, 20, 50, 1000$ and 10,000.
5. Conclusion



For a given design matrix $X$ and a wide variety of model selection procedures, the efficient Monte Carlo simulation methods of Kabaila (2005) and Giri & Kabaila (2007) provide valuable information about the minimum coverage probability of the naive $1-\alpha$ confidence interval. What is also of interest, however, is to delineate general categories of design matrices $X$ and model selection procedures for which this confidence interval has poor coverage properties. The first such delineation, for the complicated kinds of model selection procedures used in practice, results from the upper bound on the minimum coverage probability of this confidence interval due to Kabaila & Leeb (2006). This upper bound, however, is valid only in large samples and applies only to conservative model selection procedures. The present paper presents a finite sample analogue of this upper bound that is applicable to a wide variety of model selection procedures and provides a delineation that is valid for finite samples.



Appendix A: Proof of Theorem 1 

In this appendix we prove Theorem 1. The proof is in 2 parts.
Part 1 For each of Cases 1-4, $K^*$ is determined by the following set of random variables:



By Theorem 1(c) and the proof of Theorem 1(e) of Kabaila (2005), in each of these cases, $K^*$ is determined by



where the random vector $\eta^*$ is defined by Kabaila (2005, p. 552).

Part 2 It follows from Part 1 and the proof of Theorem 1(f) of Kabaila (2005) that, in each of the 4 cases, $P(\theta \in I(K^*))$ is a function of $(\beta_{\ell+1}, \dots, \beta_p)/\sigma$.






Appendix B: Proof of Theorem 2 



In this appendix we prove Theorem 2. The proof is in 2 parts. 
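Part 1 Suppose that $K \in \mathcal{K}$. It is well known (see e.g. Graybill (1976, p. 222)) that

$$ \mathrm{RSS}_K = \mathrm{RSS} + (H_K\hat\beta)^T \left(H_K (X^TX)^{-1} H_K^T\right)^{-1} H_K\hat\beta. $$

Thus

$$ \frac{\mathrm{RSS}_K}{\sigma^2} = \frac{\mathrm{RSS}}{\sigma^2} + V_K, $$

where

$$ V_K = \left(\frac{1}{\sigma} H_K\hat\beta\right)^T \left(H_K (X^TX)^{-1} H_K^T\right)^{-1} \frac{1}{\sigma} H_K\hat\beta. $$

By a well-known result (see e.g. Graybill (1976, p. 127)), $V_K$ has a noncentral chi-squared distribution with $|K|$ degrees of freedom and noncentrality parameter

$$ \lambda = \frac{1}{2}\left(\frac{1}{\sigma} H_K\beta\right)^T \left[H_K (X^TX)^{-1} H_K^T\right]^{-1} \frac{1}{\sigma} H_K\beta $$

(in the notation for noncentral chi-squared distributions used by Graybill (1976)).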
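The identity for $\mathrm{RSS}_K$ quoted from Graybill (1976) can be confirmed numerically. In this Python sketch (standard library only; `inverse`, `ols_rss` and `rss_K_via_formula` are hypothetical helpers), $H_K (X^TX)^{-1} H_K^T$ is simply the submatrix of $(X^TX)^{-1}$ with rows and columns in $K$, and $H_K\hat\beta$ is the subvector of $\hat\beta$ indexed by $K$. The result is compared with the residual sum of squares from refitting without the columns in $K$ (columns are indexed from 0).

```python
def inverse(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def ols_rss(X, Y, cols):
    """Least squares fit of Y on the listed columns of X; returns (coefficients, RSS)."""
    Xc = [[row[j] for j in cols] for row in X]
    r = len(cols)
    Ginv = inverse([[sum(x[i] * x[k] for x in Xc) for k in range(r)] for i in range(r)])
    XtY = [sum(x[i] * y for x, y in zip(Xc, Y)) for i in range(r)]
    b = [sum(Ginv[i][k] * XtY[k] for k in range(r)) for i in range(r)]
    rss = sum((y - sum(bi * xi for bi, xi in zip(b, x))) ** 2 for x, y in zip(Xc, Y))
    return b, rss


def rss_K_via_formula(X, Y, K):
    """RSS + (H_K b)^T (H_K (X^T X)^{-1} H_K^T)^{-1} H_K b, with b the unrestricted estimate."""
    p = len(X[0])
    b, rss = ols_rss(X, Y, list(range(p)))
    M = inverse([[sum(x[i] * x[k] for x in X) for k in range(p)] for i in range(p)])
    idx = sorted(K)
    C = inverse([[M[i][k] for k in idx] for i in idx])   # (H_K (X^T X)^{-1} H_K^T)^{-1}
    Hb = [b[i] for i in idx]                            # H_K beta_hat
    return rss + sum(Hb[i] * C[i][k] * Hb[k] for i in range(len(idx)) for k in range(len(idx)))
```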

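In Cases 1-3, express $\hat K$ in terms of the following set of random variables:

$$ \left\{\frac{\mathrm{RSS}}{\sigma^2}\right\} \cup \{V_K : K \in \mathcal{K}\}. $$

In Case 4, express $\hat K$ in terms of the following set of random variables:

$$ \left\{\frac{\mathrm{RSS}}{\sigma^2}\right\} \cup \{V_{\{j\}} : j \in \{q+1, \dots, p\}\}. $$

Note that $\mathrm{RSS}/\sigma^2$ and $V_K$ are independent random variables and $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$.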
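Part 2 Fix $(\beta_{\ell+1}, \dots, \beta_p)/\sigma$. Choose $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma$ and consider $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma \to \infty$. Define $\mathcal{J}$ to be the family of sets that belong to $\mathcal{K}$ and include at least one element of $\{q+1, \dots, \ell\}$. Note that $\mathcal{K}$ is the disjoint union of $\mathcal{K}^*$ and $\mathcal{J}$.

(a) Using the expression for $\hat K$ found in Part 1, it may be shown that, for each of the 4 cases and for each $K \in \mathcal{J}$,

$$ P(\hat K = K) \to 0 $$

as $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma \to \infty$. For example, for Case 1, minimizing $\mathrm{AIC}(K)$ with respect to $K \in \mathcal{K}$ is equivalent to minimizing

$$ IC(K) = n \ln\left(\frac{\mathrm{RSS}}{\sigma^2} + V_K\right) + 2(p - |K|) f(n) $$

with respect to $K \in \mathcal{K}$. For each $K \in \mathcal{J}$, the noncentrality parameter of $V_K$ tends to infinity, so that

$$ P(\hat K = K) \le P(IC(K) \le IC(\emptyset)) \to 0 $$

as $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma \to \infty$. Hence, in each of the 4 cases, $P(\hat K \in \mathcal{J}) \to 0$ as $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma \to \infty$.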

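(b) Observe that the minimum value of $P(\theta \in I(\hat K))$ is bounded above by

$$ \begin{aligned} P(\theta \in I(\hat K)) &= P\Big(\bigcup_{K \in \mathcal{K}^*} \big(\{\theta \in I(K)\} \cap \{\hat K = K\}\big)\Big) + P\Big(\bigcup_{K \in \mathcal{J}} \big(\{\theta \in I(K)\} \cap \{\hat K = K\}\big)\Big) \\ &\le P\Big(\bigcup_{K \in \mathcal{K}^*} \big(\{\theta \in I(K)\} \cap \{\hat K = K\}\big)\Big) + P(\hat K \in \mathcal{J}) \\ &\le P\Big(\bigcup_{K \in \mathcal{K}^*} \big(\{\theta \in I(K)\} \cap \{K^* = K\}\big)\Big) + P(\hat K \in \mathcal{J}) \\ &= P(\theta \in I(K^*)) + P(\hat K \in \mathcal{J}), \end{aligned} $$

since $\{\hat K = K\} \subseteq \{K^* = K\}$ for each $K \in \mathcal{K}^*$. By letting $|\beta_{q+1}|/\sigma = \dots = |\beta_\ell|/\sigma \to \infty$, and noting that, by Theorem 1, the coverage probability of $I(K^*)$ depends only on the fixed $(\beta_{\ell+1}, \dots, \beta_p)/\sigma$, we see that the minimum value of $P(\theta \in I(\hat K))$ is bounded above by $P(\theta \in I(K^*))$.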



Appendix C: Proof of Theorem 3 



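In this appendix we prove Theorem 3. Define the random variables

$$ G = \frac{\hat\theta - \theta}{\sigma\sqrt{v_{11}}} \quad \text{and} \quad H = \frac{\hat\beta_p}{\sigma\sqrt{v_{22}}}. $$

Note that $T = H/W$. In each of the 4 cases, $K^* = \emptyset$ if $|T| > d$ and $K^* = \{p\}$ otherwise. It is straightforward to show that the confidence interval $I(K^*)$ for $\theta$ is

$$ \left[\hat\theta - t(n-p)\sqrt{v_{11}}\sqrt{\frac{\mathrm{RSS}}{n-p}},\; \hat\theta + t(n-p)\sqrt{v_{11}}\sqrt{\frac{\mathrm{RSS}}{n-p}}\right] $$

if $|T| > d$, and

$$ \left[\hat\theta - \frac{v_{12}}{v_{22}}\hat\beta_p - t(n-p+1)\sqrt{\frac{\mathrm{RSS} + \hat\beta_p^2/v_{22}}{n-p+1}}\sqrt{v_{11} - \frac{v_{12}^2}{v_{22}}},\;\; \hat\theta - \frac{v_{12}}{v_{22}}\hat\beta_p + t(n-p+1)\sqrt{\frac{\mathrm{RSS} + \hat\beta_p^2/v_{22}}{n-p+1}}\sqrt{v_{11} - \frac{v_{12}^2}{v_{22}}}\right] $$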
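otherwise. Note that

$$ \begin{pmatrix} G \\ H \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ \gamma \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right). \qquad (C.1) $$

It may be shown that the coverage probability of $I(K^*)$ is equal to

$$ P\left(\{\ell_1(W) \le G \le u_1(W)\} \cap \left\{\frac{|H|}{W} > d\right\}\right) + P\left(\{\ell_2(H, W, \rho) \le G \le u_2(H, W, \rho)\} \cap \left\{\frac{|H|}{W} \le d\right\}\right). \qquad (C.2) $$

Remember that $\ell_1$, $u_1$, $\ell_2$ and $u_2$ are defined in Section 3. Using the fact that

$$ \begin{pmatrix} -G \\ -H \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ -\gamma \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), $$

it may be shown that (C.2) is an even function of $\gamma$.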

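The random vectors $(G, H)$ and $W$ are independent. It follows from (C.1) that the probability density function of $H$, evaluated at $h$, is $\phi(h - \gamma)$. Thus

$$ P\left(\{\ell_1(W) \le G \le u_1(W)\} \cap \left\{\frac{|H|}{W} > d\right\}\right) = \int_0^\infty \int_{\{|h| > dw\}} \int_{\ell_1(w)}^{u_1(w)} f_{G|H}(g \mid h)\, dg\, \phi(h-\gamma)\, dh\, f_W(w)\, dw, \qquad (C.3) $$

where $f_{G|H}(g \mid h)$ denotes the probability density function of $G$ conditional on $H = h$, evaluated at $g$. By (C.1), the probability distribution of $G$ conditional on $H = h$ is $N(\rho(h-\gamma), 1-\rho^2)$. It follows that (C.3) is equal to

$$ \int_0^\infty \int_{\{|h| > dw\}} k^\dagger(h, w, \gamma, \rho)\, \phi(h-\gamma)\, f_W(w)\, dh\, dw. \qquad (C.4) $$

The standard $1-\alpha$ confidence interval $I(\emptyset)$ for $\theta$ has coverage probability $1-\alpha$, so that $1-\alpha = P(\ell_1(W) \le G \le u_1(W))$. Thus (C.4) is equal to

$$ (1-\alpha) - \int_0^\infty \int_{-dw}^{dw} k^\dagger(h, w, \gamma, \rho)\, \phi(h-\gamma)\, f_W(w)\, dh\, dw. $$

Similarly,

$$ P\left(\{\ell_2(H, W, \rho) \le G \le u_2(H, W, \rho)\} \cap \left\{\frac{|H|}{W} \le d\right\}\right) = \int_0^\infty \int_{-dw}^{dw} k(h, w, \gamma, \rho)\, \phi(h-\gamma)\, f_W(w)\, dh\, dw. $$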
Hence, $P(\theta \in I(K^*))$ is equal to

$$ (1-\alpha) + \int_0^\infty \int_{-dw}^{dw} \left(k(h, w, \gamma, \rho) - k^\dagger(h, w, \gamma, \rho)\right) \phi(h-\gamma)\, f_W(w)\, dh\, dw. $$

The result follows by changing the variable of integration in the inner integral from $h$ to $x = h/w$.

That, for given $\gamma$, (1) is an even function of $\rho$ follows from the fact that $\Phi(b) - \Phi(a) = \Phi(-a) - \Phi(-b)$ for all $a, b \in \mathbb{R}$.
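As an independent sanity check on (1), the coverage of $I(K^*)$ in Theorem 3 can also be estimated by simulating $(G, H)$ from (C.1) and $W$ independently, then applying the rule "$K^* = \emptyset$ if $|H|/W > d$". The Python sketch below (standard library only; `mc_coverage` is a hypothetical helper, and the quantiles $t(m)$ and $t(m+1)$ are supplied by the caller) is a Monte Carlo illustration, not part of the proof.

```python
import math
import random


def mc_coverage(gamma, rho, d, m, t1, t2, reps=20000, seed=0):
    """Monte Carlo estimate of P(theta in I(K*)) in the setting of Theorem 3.

    (G, H) ~ N((0, gamma), [[1, rho], [rho, 1]]) as in (C.1), and W = sqrt(Q/m) with
    Q ~ chi-squared_m, independent of (G, H). t1 = t(m) and t2 = t(m + 1) are supplied.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(reps):
        H = gamma + rng.gauss(0.0, 1.0)
        G = rho * (H - gamma) + s * rng.gauss(0.0, 1.0)    # G | H = h ~ N(rho(h - gamma), 1 - rho^2)
        W = math.sqrt(rng.gammavariate(0.5 * m, 2.0) / m)  # chi-squared_m is Gamma(m/2, scale 2)
        if abs(H) / W > d:                                  # K* = {}: unrestricted interval
            hits += abs(G) <= t1 * W
        else:                                               # K* = {p}: beta_p set to 0
            half = t2 * math.sqrt((m * W * W + H * H) / (m + 1)) * s
            hits += rho * H - half <= G <= rho * H + half
    return hits / reps
```

Comparing `mc_coverage` with a quadrature evaluation of (1) at the same $\gamma$, $\rho$, $d$ and $m$ is a quick way to catch sign or indexing errors in either computation.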